from pyspark.sql import SparkSession
from pyspark.sql import functions as F
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Web Interaction and Display
from IPython.display import Image, display, HTML
# Additional JavaScript for toggling code display in Jupyter Notebooks
HTML(
"""
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/2.0.3/jquery.min.js "></script>
<script>
code_show=true;
function code_toggle() {
if (code_show){
$('div.jp-CodeCell > div.jp-Cell-inputWrapper').hide();
} else {
$('div.jp-CodeCell > div.jp-Cell-inputWrapper').show();
}
code_show = !code_show
}
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit"
value="Click here to toggle on/off the raw code."></form>
"""
)
ABSTRACT
This study examines the nature and scale of events reported during the first week of October 2016, shortly after Rodrigo Duterte took office as President of the Philippines, using the GDELT Global Knowledge Graph (GKG) dataset. The aim is to create a prototype model for the Philippine judicial system and government authorities to strengthen crime prevention efforts and reduce crime opportunities. The study addresses key questions: the nature and scale of events during this period, comparisons to the previous year, and the situation in the Philippines relative to neighboring Southeast Asian countries. The findings show a significant increase in "KILL" events (60.84% of reports) and notable rises in "ARREST" (18.69%), "KIDNAP" (8.83%), and "AFFECT" (8.53%) reports, with the Philippines ranking second in "KILL" reports among Southeast Asian countries. Future research should include more extensive datasets, incorporate geographical data, and utilize other GDELT datasets for deeper insights, comparing top and bottom countries based on GDP and conducting longitudinal studies to understand the broader implications of Duterte's administration.
INTRODUCTION
Motivation
On the 9th of May in 2016, the Philippines elected a new leader. Rodrigo “Digong” Duterte, then 71 years old, won the presidential election in a landslide victory (about 39% of the vote) on the back of his hardline promises to combat the illegal drug trade in the country. He had built a reputation as an uncompromising, results-oriented mayor of Davao City, where he served for over two decades and placed law and order above legal safeguards for alleged offenders.
During his campaign, Duterte vowed to launch a large-scale anti-drug drive reminiscent of his tenure as mayor of Davao. Indeed, shortly after assuming the presidency in June, he openly encouraged the public to “go ahead and kill” drug addicts (Worley, 2016), which served to justify vigilante assaults on individuals suspected of involvement in the drug trade or substance abuse. Law enforcement agencies conducted extensive raids, and the Philippine National Police (PNP) divulged a list of high-ranking officials and influential figures allegedly complicit in the drug trade, a list later shown to be inadequately substantiated (Suarez, 2016).
Despite ethical concerns surrounding Duterte’s anti-drug campaign, many Filipinos regarded his draconian tactics as necessary to combat a grave social issue. In fact, public support was strong at the end of 2016, with an 85% approval rating, although this decreased slightly to 78% by mid-2018 (Flores, 2018; Office of the Communications Secretary, 2016).
Duterte’s ability to execute his war on drugs can be attributed to several factors. The premise of this paper rests on two explanations for the success of Duterte’s tactics on the drug war (Philippine Politics Under Duterte: A Midterm Assessment, n.d.):
- A slow and ineffective justice system. Confronted by a judicial system that was both sluggish and corrupt, most Filipinos were willing to tolerate any politician who made promises and delivered on them. This was significantly different from the Marcos regime, when the targeting of college students and activists caused public outrage. Drug dealers and addicts, by contrast, are a stigmatized group, so it is more difficult for them to gain political support for the defense of their rights.
- An indication of government responsiveness. It’s rare for multiple government bodies, including the PNP, national agencies, and local authorities, to collaborate in addressing pressing societal concerns, so Duterte’s whole-of-government approach engendered an increased sense of security among Filipinos.
Problem Statement & Objectives
This paper aims to serve as a prototype of a tool that the Philippine judicial system and government authorities could use to strengthen their crime prevention efforts and reduce the opportunities for crime. It concentrates on the first week of October 2016, three months after Duterte assumed office, and seeks to address the following key questions:
- What was the nature and scale of the events reported during the first week of October 2016, particularly those related to Duterte’s war on drugs?
- How did these events compare to the corresponding period in the previous year, before Duterte took office?
- How did the situation in the Philippines during that period compare to neighboring countries in Southeast Asia?
In a hypothetical world where the loyalty of our law enforcement system lies solely with the rule of law and not with any specific administration, the ultimate goal of this paper is to create a model for future administrations to proactively address pressing issues while upholding the rule of law and protecting the rights of all citizens, regardless of their socioeconomic status or alleged involvement in illegal activities. Using data from the Global Database of Events, Language, and Tone (GDELT) Project, we explore whether a data-driven approach could have provided early insights or warnings regarding the potential consequences of Duterte’s policies.
The findings could also lay the groundwork for a global monitoring system that international organizations, such as the United Nations, could use to respond more promptly to global issues. The UN Human Rights Council did not vote to establish an investigation into the alleged crimes during the Duterte administration’s anti-drug campaign until 2019 (Philippines Drugs War: UN Votes to Investigate Killings, 2019). Since 2016, law enforcement has acknowledged that at least 6,600 dealers or users were killed, while activists argue that more than 27,000 individuals have died.
DATA SOURCES AND DESCRIPTION
The GDELT (Global Database of Events, Language, and Tone) Project is a comprehensive global monitoring system that tracks news content from broadcasters, print media, and websites in numerous languages across almost every country in the world. It employs advanced techniques to identify and extract key elements driving global discourse and events, such as people, locations, organizations, topics, sources, sentiments, numerical data, quotations, images, and events. By continuously processing this immense stream of information on a second-by-second basis, GDELT generates a freely accessible open data platform that enables computational analysis of the world's events, narratives, and societal forces in real time.
GDELT has several datasets. This study uses the Global Knowledge Graph (GKG) dataset to explore global news coverage. The GKG captures the key dimensions, geographic trends, and connections in the news by using sophisticated natural language processing algorithms. These algorithms generate and encode a wide range of metadata, highlighting the hidden and contextual details in each document. Essentially, the GKG creates a vast network linking people, organizations, locations, numerical data, themes, news sources, and events from around the world. This interconnected network provides insights into global events, the contexts and entities involved, and the sentiments surrounding these events, offering a daily comprehensive view of our global society.
The GKG dataset has the following fields:
| Field Name | Description |
|---|---|
| GKGRECORDID | Each GKG record is assigned a globally unique identifier in a date-oriented serial number format. |
| V2.1DATE | The publication date of the news media used to construct the GKG file, in YYYYMMDDHHMMSS format. |
| V2SOURCECOLLECTIONIDENTIFIER | A numeric identifier specifying the source collection the document came from. |
| V2SOURCECOMMONNAME | A human-friendly identifier of the source of the document. |
| V2DOCUMENTIDENTIFIER | The unique external identifier for the source document. |
| V1COUNTS | A semicolon-delimited list of counts found in the document, each separated by the pound symbol. |
| V2.1COUNTS | Similar to V1COUNTS but includes character offsets for each count. |
| V1THEMES | A semicolon-delimited list of all themes found in the document. |
| V2ENHANCEDTHEMES | Includes all GKG themes referenced in the document along with character offsets. |
| V1LOCATIONS | A semicolon-delimited list of all locations found in the text, extracted using the Leetaru algorithm. |
| V2ENHANCEDLOCATIONS | Similar to V1LOCATIONS but includes an extra field for character offsets and additional details. |
| V1PERSONS | A semicolon-delimited list of all person names found in the text. |
| V2ENHANCEDPERSONS | Includes all person names referenced in the document along with character offsets. |
| V1ORGANIZATIONS | A semicolon-delimited list of all company and organization names found in the text. |
| V2ENHANCEDORGANIZATIONS | Includes all organizations/companies referenced in the document along with character offsets. |
| V1.5TONE | A list of six core emotional dimensions, each recorded as a single precision floating point number. |
| V2.1ENHANCEDDATES | Contains a list of all date references in the document, along with character offsets. |
| V2GCAM | The Global Content Analysis Measures field, which runs an array of content analysis systems over each document. |
| V2.1SHARINGIMAGE | Specifies a "sharing image" for each article as specified by news websites. |
| V2.1RELATEDIMAGES | A list of URLs of images deemed most relevant to the core story of the article. |
| V2.1SOCIALIMAGEEMBEDS | A list of URLs of image-based social media posts embedded in articles. |
| V2.1SOCIALVIDEOEMBEDS | A list of URLs of videos embedded in articles from various platforms like YouTube, Vimeo, etc. |
| V2.1QUOTATIONS | Extracts and segments all quoted statements from each article. |
| V2.1ALLNAMES | Contains a list of all proper names referenced in the document, along with character offsets. |
| V2.1AMOUNTS | Contains a list of all precise numeric amounts referenced in the document, along with character offsets. |
| V2.1TRANSLATIONINFO | Records provenance information for machine translated documents. |
| V2EXTRASXML | Reserved to hold special non-standard data applicable to special subsets of the GDELT collection. |
Key Data Columns
This study extracted insights primarily using the following columns: V1COUNTS, V1THEMES, and V1LOCATIONS. Detailed information on V1COUNTS and V1THEMES is provided below.
V1COUNTS
Sample Entry:
["ARREST#400#political#4#Rossiya, Orenburgskaya Oblast', Russia#RS#RS55#52.4#54.9833#-2993111;"]
Description:
- The V1COUNTS column contains data representing various categories of counts. Each entry begins with a count type whose value is derived from the NAME field in the Category List spreadsheet. Common categories include AFFECT, ARREST, KIDNAP, KILL, PROTEST, SEIZE, and WOUND. Other categories may also appear depending on the context and as new count categories are introduced over time.
- For instance, a value of "PROTEST" in this field would indicate the count of protesters at a protest.
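To make the layout of a V1COUNTS entry concrete, the following is a minimal sketch that splits the sample entry above into named fields. The field order (count type, count, object type, location type, location name, country code, ADM1 code, latitude, longitude, feature ID) is an assumption based on our reading of the GKG documentation and on the sample itself; the names `CountEntry` and `parse_v1counts` are hypothetical and are not part of the notebook.

```python
from typing import List, NamedTuple, Optional


class CountEntry(NamedTuple):
    """One '#'-delimited block of a V1COUNTS value (assumed field order)."""
    count_type: str
    count: int
    object_type: str
    location_type: int
    location_name: str
    country_code: str
    adm1_code: str
    latitude: Optional[float]
    longitude: Optional[float]
    feature_id: str


def _to_float(text: str) -> Optional[float]:
    """Return text as a float, or None when it is empty or not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_v1counts(raw: Optional[str]) -> List[CountEntry]:
    """Split a V1COUNTS string on ';' and parse every well-formed block."""
    entries: List[CountEntry] = []
    if not raw:
        return entries
    for block in raw.split(";"):
        parts = block.split("#")
        if len(parts) != 10:
            continue  # skips the empty trailing block and malformed entries
        try:
            count, location_type = int(parts[1]), int(parts[3])
        except ValueError:
            continue  # non-numeric count or location type
        entries.append(CountEntry(
            parts[0], count, parts[2], location_type, parts[4],
            parts[5], parts[6], _to_float(parts[7]), _to_float(parts[8]),
            parts[9],
        ))
    return entries


sample = ("ARREST#400#political#4#Rossiya, Orenburgskaya Oblast', Russia"
          "#RS#RS55#52.4#54.9833#-2993111;")
entry = parse_v1counts(sample)[0]
print(entry.count_type, entry.count, entry.country_code)  # ARREST 400 RS
```

Parsing every block, rather than only the first, keeps documents that report several counts from being reduced to a single count type.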
Count Types:
| Count Type | Description |
|---|---|
| AFFECT | Captures a range of impacts including sickness, evacuations, displaced persons, and more. |
| ARREST | Includes mentions of arrests, detentions, imprisonments, and similar actions. |
| DISPLACED | Counts instances of people being displaced; see REFUGEES for mentions of refugees and forced migration. |
| EVACUATION | Mentions of evacuations. |
| KIDNAP | Instances of kidnappings, abductions, and hostage situations. |
| KILL | Any mention of deaths. |
| PROTEST | Discussions of protests, demonstrations, riots, strikes, and related activities. |
| REFUGEES | Counts mentions of refugees, displaced persons, forced migration, and asylum seekers. |
| SEIZE | Instances of seizures, often involving drugs or illegal materials. |
| SICKENED | Counts of mentions related to sickness. |
| WOUND | Any mention of injuries or wounds. |
V1THEMES
Sample Entry:
SELF_IDENTIFIED_HUMAN_RIGHTS;FREESPEECH;POLITICAL_PRISONER;PROTEST;MOVEMENT_GENERAL;
Description:
- The V1THEMES column includes various themes associated with the dataset. A comprehensive description of the 284 themes, including the count types mentioned earlier, can be found here. However, this list is not exhaustive; some themes present in the dataset lack descriptions in the linked file and need to be sourced externally. For example, additional theme information can be found here.
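As a small illustration of how a V1THEMES value can be broken apart, the sketch below splits the sample entry on semicolons and tallies how many documents mention each theme. The helper names `split_themes` and `count_themes` are hypothetical, and counting each theme at most once per document is a design choice made for this example only.

```python
from collections import Counter
from typing import Iterable, List, Optional


def split_themes(raw: Optional[str]) -> List[str]:
    """Split a semicolon-delimited V1THEMES value, dropping empty pieces."""
    if not raw:
        return []
    return [theme for theme in raw.split(";") if theme]


def count_themes(rows: Iterable[Optional[str]]) -> Counter:
    """Count the number of documents in which each theme appears."""
    counts: Counter = Counter()
    for raw in rows:
        # set() ensures a theme repeated within one document counts once
        counts.update(set(split_themes(raw)))
    return counts


sample = ("SELF_IDENTIFIED_HUMAN_RIGHTS;FREESPEECH;POLITICAL_PRISONER;"
          "PROTEST;MOVEMENT_GENERAL;")
print(split_themes(sample))
```

The trailing semicolon in the raw value yields an empty piece after splitting, which is why empty strings are filtered out.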
Sample Theme Descriptions:
| Theme | Description |
|---|---|
| CORRUPTION | Mentions of corruption, kickbacks, embezzlement, profiteering, and similar activities. |
| CRIME_CARTELS | Focuses on mentions of drug cartels and drug corridors; does not currently list specific cartels. |
| CRIME_COMMON_ROBBERY | Discussions of general crimes like pickpocketing, robbery, and street crime. |
| CRIME_ILLEGAL_DRUGS | Mentions of illegal drugs. |
| CYBER_ATTACK | Discussions of cyberwarfare, cyberattacks, phishing, hacking, hacktivists, viruses, etc. |
Note that a glossary of the count types and themes mentioned in the results of the paper can be used as a reference for more context. Alternatively, the interpretations also include a comprehensive explanation of these types and themes.
METHODOLOGY
Data Collection
As this paper aims to provide a prototype for a monitoring system, we selected an arbitrary period for exploration, focusing specifically on the first week of October 2016, three months after the Duterte administration began. To substantiate our analysis, we also extracted data from the same period in 2015. This approach allows us to observe significant changes between the two years that may be associated with Duterte assuming office. The data was collected as follows:
- Retrieve GDELT's zipped files from Jojie's public directory.
- Transfer the files to a shared directory.
- Extract the zipped files into CSV format for further processing.
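The three collection steps above can be sketched with the Python standard library. This is only an illustration under stated assumptions: the function name `stage_and_extract` and both directory arguments are hypothetical placeholders, and the actual locations of the public and shared directories are not reproduced here.

```python
import shutil
import zipfile
from pathlib import Path
from typing import List, Union


def stage_and_extract(source_dir: Union[str, Path],
                      shared_dir: Union[str, Path],
                      pattern: str = "*.zip") -> List[Path]:
    """Copy zipped files into a shared directory and extract them there.

    Returns the paths of all extracted members.
    """
    source_dir, shared_dir = Path(source_dir), Path(shared_dir)
    shared_dir.mkdir(parents=True, exist_ok=True)
    extracted: List[Path] = []
    for archive in sorted(source_dir.glob(pattern)):
        # Step 2: transfer the archive to the shared directory
        staged = shared_dir / archive.name
        shutil.copy2(archive, staged)
        # Step 3: extract the CSV file(s) next to the staged archive
        with zipfile.ZipFile(staged) as zf:
            zf.extractall(shared_dir)
            extracted.extend(shared_dir / name for name in zf.namelist())
    return extracted
```

A call such as `stage_and_extract("/path/to/public/gdelt", "/path/to/shared")` (both paths illustrative) would return the list of extracted files, which can then be passed to the preprocessing step.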
Data Preprocessing
- Convert the CSV files to parquet format for efficient processing.
- Read the parquet files into the system.
Note: The actual implementation of the data preprocessing step is found in a supplementary notebook.
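Since the supplementary notebook is not reproduced here, the following is a rough sketch of what the conversion might look like, not the implementation actually used. It assumes an existing `SparkSession` passed in as `spark`, assumes the extracted files are tab-delimited and have no header row, and assigns the 27 column names listed in the field table above. The function name `csv_to_parquet` and its arguments are hypothetical.

```python
# Column names in the order given in the GKG field table above.
GKG_COLUMNS = [
    "GKGRECORDID", "V2.1DATE", "V2SOURCECOLLECTIONIDENTIFIER",
    "V2SOURCECOMMONNAME", "V2DOCUMENTIDENTIFIER", "V1COUNTS", "V2.1COUNTS",
    "V1THEMES", "V2ENHANCEDTHEMES", "V1LOCATIONS", "V2ENHANCEDLOCATIONS",
    "V1PERSONS", "V2ENHANCEDPERSONS", "V1ORGANIZATIONS",
    "V2ENHANCEDORGANIZATIONS", "V1.5TONE", "V2.1ENHANCEDDATES", "V2GCAM",
    "V2.1SHARINGIMAGE", "V2.1RELATEDIMAGES", "V2.1SOCIALIMAGEEMBEDS",
    "V2.1SOCIALVIDEOEMBEDS", "V2.1QUOTATIONS", "V2.1ALLNAMES",
    "V2.1AMOUNTS", "V2.1TRANSLATIONINFO", "V2EXTRASXML",
]


def csv_to_parquet(spark, csv_path, parquet_path, sep="\t"):
    """Read delimited GKG files and rewrite them as parquet.

    `spark` is an existing SparkSession; `sep` is an assumption about the
    extracted files and may need to be adjusted.
    """
    raw = spark.read.csv(csv_path, sep=sep, header=False, inferSchema=True)
    # Replace Spark's default _c0, _c1, ... names with the GKG field names
    named = raw.toDF(*GKG_COLUMNS)
    named.write.mode("overwrite").parquet(parquet_path)
    return named
```

Parquet stores the data by column, so later queries that touch only a few fields, such as V1COUNTS, V1THEMES, and V1LOCATIONS, do not need to scan the wide text columns.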
Data Cleaning
This step entailed studying the dataset's comprehensive documentation and becoming familiar with the various codes present in the data. It also involved feature engineering, particularly separating the pieces of information packed into a single column by using regular expressions (regex) to parse the text accurately. Because the columns of the GDELT GKG dataset are nested and mix string and numerical values, null and empty values were handled on an as-needed basis, guided by the results of the visualizations.
Exploratory Data Analysis (EDA)
- Extract insights using various data visualization techniques and data wrangling methods in PySpark.
- Utilize tools such as Plotly and Matplotlib to create visual representations of the data.
Actionable Insights
- Recommend actionable steps based on the gathered insights.
Initial Data Exploration
spark = (SparkSession
.builder
.master('local[*]') # Master URL;
.getOrCreate())
df = spark.read.parquet("/home/msds2024/xx/cptx_shared/sltx/bdcc"\
"/201610/parquets")
df_2015 = spark.read.parquet("/home/msds2024/xx/cptx_shared/sltx/"\
"bdcc/201510/parquets")
The 2016 GKG dataset contains a total of 2,087,088 rows, while the 2015 GKG dataset has 1,655,439. Both are structured as follows:
Note: The sample dataframes will be displayed in their transposed form for clarity and to allow the readers to easily see the referenced columns.
def spark_info(df, year=2016):
    """
    Display information about a Spark DataFrame including the first five rows,
    schema, column data types, and total number of rows.

    Parameters
    ----------
    df : pyspark.sql.DataFrame
        The Spark DataFrame to analyze.
    year : int, optional
        The year to include in the output. Defaults to 2016.
    """
    # Show the first 5 rows (transposed so that columns read as rows)
    print(f"\nFirst 5 rows ({year}):")
    display(df.limit(5).toPandas().transpose())
    # Print the schema
    print(f"Schema: {year}")
    df.printSchema()
    # Get column data types
    print(f"\nData types: {year}")
    for name, dtype in df.dtypes:
        print(f"{name}: {dtype}")
    # Count total number of rows
    row_count = df.count()
    print(f"\nTotal number of rows ({year}): {row_count}", end='\n\n')
    print('-'*50)

spark_info(df_2015, year=2015)
spark_info(df, year=2016)
First 5 rows (2015):
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| GKGRECORDID | 20151008160000-0 | 20151008160000-1 | 20151008160000-2 | 20151008160000-3 | 20151008160000-4 |
| V2.1DATE | 20151008160000 | 20151008160000 | 20151008160000 | 20151008160000 | 20151008160000 |
| V2SOURCECOLLECTIONIDENTIFIER | 2 | 2 | 2 | 2 | 2 |
| V2SOURCECOMMONNAME | BBC Monitoring | BBC Monitoring | BBC Monitoring | BBC Monitoring | BBC Monitoring |
| V2DOCUMENTIDENTIFIER | /BBC Monitoring/(c) BBC | Interfax-AVN military news agency, Moscow/BBC ... | MK (Moskovskiy Komsomolets) website, Moscow/BB... | Novosti Pridnestrovya website, Tiraspol/BBC Mo... | BelaPAN news agency, Minsk/BBC Monitoring/(c) BBC |
| V1COUNTS | None | None | None | None | None |
| V2.1COUNTS | None | None | None | None | None |
| V1THEMES | TAX_ETHNICITY;TAX_ETHNICITY_RUSSIAN;TAX_WORLDL... | TAX_ETHNICITY;TAX_ETHNICITY_RUSSIAN;TAX_WORLDL... | TAX_ETHNICITY;TAX_ETHNICITY_RUSSIAN;TAX_WORLDL... | TAX_ETHNICITY;TAX_ETHNICITY_RUSSIAN;TAX_WORLDL... | TAX_ETHNICITY;TAX_ETHNICITY_BELARUSIAN;TAX_FNC... |
| V2ENHANCEDTHEMES | TAX_FNCACT_LEADERS,753;WB_165_AIR_TRANSPORT,32... | WB_566_ENVIRONMENT_AND_NATURAL_RESOURCES,1338;... | TAX_FNCACT_TROOPS,201;TAX_ETHNICITY_RUSSIAN,14... | SOVEREIGNTY,812;SOVEREIGNTY,868;WB_2433_CONFLI... | GENERAL_GOVERNMENT,927;GENERAL_GOVERNMENT,2017... |
| V1LOCATIONS | 4#Sochi, Krasnodarskiy Kray, Russia#RS#RS38#43... | 1#Syria#SY#SY#35#38#SY;1#Russia#RS#RS#60#100#R... | 4#Moscow, Moskva, Russia#RS#RS48#55.7522#37.61... | 4#Tiraspol, StîA Nistrului, Moldova#MD#MD58#46... | 4#Minsk, Belarus (General), Belarus#BO#BO00#53... |
| V2ENHANCEDLOCATIONS | 4#Sochi, Krasnodarskiy Kray, Russia#RS#RS38#25... | 1#Russian#RS#RS##60#100#RS#11;1#Russian#RS#RS#... | 1#Russia#RS#RS##60#100#RS#444;1#Russia#RS#RS##... | 1#Russia#RS#RS##60#100#RS#190;1#Russia#RS#RS##... | 1#Belarus#BO#BO##53#28#BO#14;1#Belarus#BO#BO##... |
| V1PERSONS | vladimir putin;emomali rahmon | khan el-asal;maj-gen igor konashenkov | novyye vedomosti;igor smirnov;petro poroshenko... | vladimir vladimirovich;vladimir putin | mikalay statkevich;alyaksandr lukashenka;yury ... |
| V2ENHANCEDPERSONS | Vladimir Putin,580;Emomali Rahmon,653 | Khan El-Asal,648;Maj-Gen Igor Konashenkov,506 | Novyye Vedomosti,5118;Igor Smirnov,3194;Petro ... | Vladimir Vladimirovich,1643;Vladimir Putin,128... | Mikalay Statkevich,390;Alyaksandr Lukashenka,1... |
| V1ORGANIZATIONS | russian defence ministry;international news ag... | russian defence ministry;international news ag... | lukoil;sberbank | european union;russian federation | movement for the statehood;movement for freedo... |
| V2ENHANCEDORGANIZATIONS | Russian Defence Ministry,28;Russian Defence Mi... | Russian Defence Ministry,28;Russian Defence Mi... | Lukoil,683;Lukoil,983;Lukoil,1153;Sberbank,158... | European Union,844;Russian Federation,1741 | Movement For The Statehood,313;Movement For Fr... |
| V1.5TONE | -1.69491525423729,0.847457627118644,2.54237288... | -3.87931034482759,1.72413793103448,5.603448275... | -0.622775800711743,3.11387900355872,3.73665480... | -1.17994100294985,2.3598820058997,3.5398230088... | 0,2.33236151603499,2.33236151603499,4.66472303... |
| V2.1ENHANCEDDATES | None | None | 1#0#0#1992#264 | None | 4#9#11#0#2152 |
| V2GCAM | wc:104,c12.1:3,c12.10:4,c12.12:1,c12.13:2,c12.... | wc:204,c12.1:10,c12.10:17,c12.12:12,c12.13:3,c... | wc:1063,c1.2:11,c1.3:2,c12.1:71,c12.10:131,c12... | wc:307,c1.2:1,c12.1:11,c12.10:24,c12.12:5,c12.... | wc:321,c12.1:11,c12.10:28,c12.12:3,c12.13:13,c... |
| V2.1SHARINGIMAGE | None | None | None | None | None |
| V2.1RELATEDIMAGES | None | None | None | None | None |
| V2.1SOCIALIMAGEEMBEDS | None | None | None | None | None |
| V2.1SOCIALVIDEOEMBEDS | None | None | None | None | None |
| V2.1QUOTATIONS | None | None | None | 216\|29\|\|25 years together with Russia | None |
| V2.1ALLNAMES | Russian Defence Ministry,29;Russian Defence Mi... | Russian Defence Ministry,29;Russian Defence Mi... | Dniester River,396;Even Romania,1126;Russian D... | Russian President Vladimir,128;Dniester Moldov... | Andrea Wiktorin,41;European Union,81;Anatol Ly... |
| V2.1AMOUNTS | 2,leaders,640; | 22,sorties,1063;27,terrorist targets,1083; | None | 2,states,938; | None |
| V2.1TRANSLATIONINFO | None | None | None | None | None |
| V2EXTRASXML | None | None | None | None | None |
Schema: 2015
root
 |-- GKGRECORDID: string (nullable = true)
 |-- V2.1DATE: long (nullable = true)
 |-- V2SOURCECOLLECTIONIDENTIFIER: integer (nullable = true)
 |-- V2SOURCECOMMONNAME: string (nullable = true)
 |-- V2DOCUMENTIDENTIFIER: string (nullable = true)
 |-- V1COUNTS: string (nullable = true)
 |-- V2.1COUNTS: string (nullable = true)
 |-- V1THEMES: string (nullable = true)
 |-- V2ENHANCEDTHEMES: string (nullable = true)
 |-- V1LOCATIONS: string (nullable = true)
 |-- V2ENHANCEDLOCATIONS: string (nullable = true)
 |-- V1PERSONS: string (nullable = true)
 |-- V2ENHANCEDPERSONS: string (nullable = true)
 |-- V1ORGANIZATIONS: string (nullable = true)
 |-- V2ENHANCEDORGANIZATIONS: string (nullable = true)
 |-- V1.5TONE: string (nullable = true)
 |-- V2.1ENHANCEDDATES: string (nullable = true)
 |-- V2GCAM: string (nullable = true)
 |-- V2.1SHARINGIMAGE: string (nullable = true)
 |-- V2.1RELATEDIMAGES: string (nullable = true)
 |-- V2.1SOCIALIMAGEEMBEDS: string (nullable = true)
 |-- V2.1SOCIALVIDEOEMBEDS: string (nullable = true)
 |-- V2.1QUOTATIONS: string (nullable = true)
 |-- V2.1ALLNAMES: string (nullable = true)
 |-- V2.1AMOUNTS: string (nullable = true)
 |-- V2.1TRANSLATIONINFO: string (nullable = true)
 |-- V2EXTRASXML: string (nullable = true)

Data types: 2015
GKGRECORDID: string
V2.1DATE: bigint
V2SOURCECOLLECTIONIDENTIFIER: int
V2SOURCECOMMONNAME: string
V2DOCUMENTIDENTIFIER: string
V1COUNTS: string
V2.1COUNTS: string
V1THEMES: string
V2ENHANCEDTHEMES: string
V1LOCATIONS: string
V2ENHANCEDLOCATIONS: string
V1PERSONS: string
V2ENHANCEDPERSONS: string
V1ORGANIZATIONS: string
V2ENHANCEDORGANIZATIONS: string
V1.5TONE: string
V2.1ENHANCEDDATES: string
V2GCAM: string
V2.1SHARINGIMAGE: string
V2.1RELATEDIMAGES: string
V2.1SOCIALIMAGEEMBEDS: string
V2.1SOCIALVIDEOEMBEDS: string
V2.1QUOTATIONS: string
V2.1ALLNAMES: string
V2.1AMOUNTS: string
V2.1TRANSLATIONINFO: string
V2EXTRASXML: string

Total number of rows (2015): 1655439

--------------------------------------------------

First 5 rows (2016):
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| GKGRECORDID | 20161006153000-0 | 20161006153000-1 | 20161006153000-2 | 20161006153000-3 | 20161006153000-4 |
| V2.1DATE | 20161006153000 | 20161006153000 | 20161006153000 | 20161006153000 | 20161006153000 |
| V2SOURCECOLLECTIONIDENTIFIER | 2 | 2 | 2 | 2 | 1 |
| V2SOURCECOMMONNAME | BBC Monitoring | BBC Monitoring | BBC Monitoring | BBC Monitoring | ap.org |
| V2DOCUMENTIDENTIFIER | Islamic Republic News Agency, Tehran/BBC Monit... | Rzeczpospolita in Polish /BBC Monitoring/(c) BBC | News/BBC Monitoring/(c) BBC | Interfax-Ukraine news agency, Kiev/BBC Monitor... | http://bigstory.ap.org/article/dade98ba770f4e4... |
| V1COUNTS | None | None | None | None | AFFECT#2#counties#2#Florida, United States#US#... |
| V2.1COUNTS | None | None | None | None | AFFECT#2#counties#2#Florida, United States#US#... |
| V1THEMES | WB_2433_CONFLICT_AND_VIOLENCE;WB_2449_PARAMILI... | TAX_ETHNICITY;TAX_ETHNICITY_BLACK;TAX_TERROR_G... | LEADER;TAX_FNCACT;TAX_FNCACT_PRESIDENT;USPEC_P... | MEDIA_MSM;CRISISLEX_CRISISLEXREC;TAX_ETHNICITY... | TAX_WORLDMAMMALS;TAX_WORLDMAMMALS_CAT;NATURAL_... |
| V2ENHANCEDTHEMES | WB_2433_CONFLICT_AND_VIOLENCE,44;WB_2449_PARAM... | GENERAL_GOVERNMENT,568;GENERAL_GOVERNMENT,1609... | TAX_FNCACT_CANDIDATES,564;TAX_FNCACT_JUDGES,61... | WB_678_DIGITAL_GOVERNMENT,1663;WB_694_BROADCAS... | WATER_SECURITY,11267;WATER_SECURITY,11373;SHOR... |
| V1LOCATIONS | None | None | 4#Pretoria, Gauteng, South Africa#SF#SF06#-25.... | 4#Kiev, Ukraine (General), Ukraine#UP#UP00#50.... | 3#Charleston, South Carolina, United States#US... |
| V2ENHANCEDLOCATIONS | None | None | 1#South Africans#SF#SF##-29#24#SF#592;4#Pretor... | 1#Ukrainian#UP#UP##49#32#UP#78;1#Ukrainian#UP#... | 3#Orlando, Florida, United States#US#USFL#FL09... |
| V1PERSONS | abdolhossein imani | jaroslaw kaczynski | muvhango lukhaimane;thulisile madonsela;sharis... | petro poroshenko | laura axelsen;john guthrie;ray gohill;nikki ha... |
| V2ENHANCEDPERSONS | Abdolhossein Imani,260 | Jaroslaw Kaczynski,456;Jaroslaw Kaczynski,1374... | Muvhango Lukhaimane,698;Thulisile Madonsela,41... | Petro Poroshenko,105;Petro Poroshenko,510 | Laura Axelsen,8279;John Guthrie,13870;Ray Gohi... |
| V1ORGANIZATIONS | iran basij paramilitary;islamic republic news ... | None | national assembly on;security agency;national ... | national police of ukraine;office of ukraine;s... | orlando international airport;u s national hur... |
| V2ENHANCEDORGANIZATIONS | Iran Basij Paramilitary,44;Islamic Republic Ne... | None | National Assembly On,803;Security Agency,977;N... | National Police Of Ukraine,1272;Office Of Ukra... | Law Enforcement Division,5563;Law Enforcement ... |
| V1.5TONE | -3.27868852459016,0.819672131147541,4.09836065... | -5.32544378698225,4.14201183431953,9.467455621... | 1.73160173160173,4.76190476190476,3.0303030303... | -4.27807486631016,1.8716577540107,6.1497326203... | -4.40702257255464,1.21820136151917,5.625223934... |
| V2.1ENHANCEDDATES | None | 4#10#3#0#85;4#10#5#0#1202 | 4#9#7#0#815;4#10#14#0#1387 | 4#9#16#0#970 | 1#0#0#2005#8146 |
| V2GCAM | wc:110,c12.1:4,c12.10:8,c12.12:4,c12.13:1,c12.... | wc:300,c1.2:1,c12.1:30,c12.10:44,c12.11:2,c12.... | wc:215,c12.1:15,c12.10:11,c12.12:3,c12.13:3,c1... | wc:265,c12.1:9,c12.10:37,c12.11:3,c12.12:13,c1... | wc:2722,c1.3:1,c12.1:92,c12.10:218,c12.12:105,... |
| V2.1SHARINGIMAGE | None | None | None | None | http://binaryapi.ap.org/09272a40e4a0481abe3ca5... |
| V2.1RELATEDIMAGES | None | None | None | None | None |
| V2.1SOCIALIMAGEEMBEDS | None | None | None | None | None |
| V2.1SOCIALVIDEOEMBEDS | None | None | None | None | None |
| V2.1QUOTATIONS | None | None | None | 1140\|133\|\|who previously carried out the terro... | None |
| V2.1ALLNAMES | Basij Paramilitary Force,52;Ila Bayt,85;Toward... | Ordo Iuris,405;Jaroslaw Kaczynski,475;Polish E... | President Jacob Zuma,33;Advocate Busisiwe Joyc... | Ukrainian President Petro Poroshenko,112;Inter... | Hurricane Matthew,127;Tropical Storm Nicole,15... |
| V2.1AMOUNTS | None | None | 60,candidates nominated by South,480;263,votes... | None | 4,hurricane MIAMI,35;4,storm,188;2,counties al... |
| V2.1TRANSLATIONINFO | None | None | None | None | None |
| V2EXTRASXML | None | None | None | None | <PAGE_LINKS>http://bit.ly/2d5oIg4</PAGE_LINKS> |
Schema: 2016
root
 |-- GKGRECORDID: string (nullable = true)
 |-- V2.1DATE: long (nullable = true)
 |-- V2SOURCECOLLECTIONIDENTIFIER: integer (nullable = true)
 |-- V2SOURCECOMMONNAME: string (nullable = true)
 |-- V2DOCUMENTIDENTIFIER: string (nullable = true)
 |-- V1COUNTS: string (nullable = true)
 |-- V2.1COUNTS: string (nullable = true)
 |-- V1THEMES: string (nullable = true)
 |-- V2ENHANCEDTHEMES: string (nullable = true)
 |-- V1LOCATIONS: string (nullable = true)
 |-- V2ENHANCEDLOCATIONS: string (nullable = true)
 |-- V1PERSONS: string (nullable = true)
 |-- V2ENHANCEDPERSONS: string (nullable = true)
 |-- V1ORGANIZATIONS: string (nullable = true)
 |-- V2ENHANCEDORGANIZATIONS: string (nullable = true)
 |-- V1.5TONE: string (nullable = true)
 |-- V2.1ENHANCEDDATES: string (nullable = true)
 |-- V2GCAM: string (nullable = true)
 |-- V2.1SHARINGIMAGE: string (nullable = true)
 |-- V2.1RELATEDIMAGES: string (nullable = true)
 |-- V2.1SOCIALIMAGEEMBEDS: string (nullable = true)
 |-- V2.1SOCIALVIDEOEMBEDS: string (nullable = true)
 |-- V2.1QUOTATIONS: string (nullable = true)
 |-- V2.1ALLNAMES: string (nullable = true)
 |-- V2.1AMOUNTS: string (nullable = true)
 |-- V2.1TRANSLATIONINFO: string (nullable = true)
 |-- V2EXTRASXML: string (nullable = true)

Data types: 2016
GKGRECORDID: string
V2.1DATE: bigint
V2SOURCECOLLECTIONIDENTIFIER: int
V2SOURCECOMMONNAME: string
V2DOCUMENTIDENTIFIER: string
V1COUNTS: string
V2.1COUNTS: string
V1THEMES: string
V2ENHANCEDTHEMES: string
V1LOCATIONS: string
V2ENHANCEDLOCATIONS: string
V1PERSONS: string
V2ENHANCEDPERSONS: string
V1ORGANIZATIONS: string
V2ENHANCEDORGANIZATIONS: string
V1.5TONE: string
V2.1ENHANCEDDATES: string
V2GCAM: string
V2.1SHARINGIMAGE: string
V2.1RELATEDIMAGES: string
V2.1SOCIALIMAGEEMBEDS: string
V2.1SOCIALVIDEOEMBEDS: string
V2.1QUOTATIONS: string
V2.1ALLNAMES: string
V2.1AMOUNTS: string
V2.1TRANSLATIONINFO: string
V2EXTRASXML: string

Total number of rows (2016): 2087088

--------------------------------------------------
Data Cleaning
As previously mentioned in the Data Sources and Description section, the GKG dataset has a nested or hierarchical structure in which some columns contain multiple pieces of information separated by delimiters. These nested elements therefore need to be denormalized, or unnested, to capture more comprehensive and granular insights.
In particular, we explode the following columns:
- V1COUNTS: To extract the count types (KILL, ARREST, PROTEST, etc.)
- V1LOCATIONS: To extract the region, country, longitude, and latitude
- V1THEMES: To extract all the themes of every discussion
- V1PERSONS: To extract all the mentioned persons in every document
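To illustrate the region and country extraction on plain strings, the sketch below applies the same idea to a V1LOCATIONS value outside of Spark: take the full location name (the second `#`-delimited field), use its first comma-separated piece as the region and its last piece as the country, and leave the region empty when the two coincide. The function names are hypothetical. Unlike a single-match regular expression, this version splits on `;` first, so every entry in the value is visited.

```python
from typing import List, Optional, Tuple

Location = Tuple[Optional[str], str]  # (region, country)


def parse_location(entry: str) -> Optional[Location]:
    """Extract (region, country) from one '#'-delimited location entry."""
    parts = entry.split("#")
    if len(parts) < 2 or not parts[1]:
        return None  # no full location name available
    pieces = parts[1].split(", ")
    region: Optional[str] = pieces[0]
    country = pieces[-1]
    if region == country:
        region = None  # country-level mention, e.g. "Poland"
    return region, country


def parse_v1locations(raw: Optional[str]) -> List[Location]:
    """Parse every ';'-separated entry of a V1LOCATIONS value."""
    if not raw:
        return []
    parsed = (parse_location(entry) for entry in raw.split(";") if entry)
    return [loc for loc in parsed if loc is not None]


print(parse_v1locations("1#Poland#PL#PL#52#20#PL"))  # [(None, 'Poland')]
```

For a name with several comma-separated parts, such as "Pretoria, Gauteng, South Africa", the first piece ("Pretoria") is treated as the region and the last ("South Africa") as the country, so intermediate administrative levels are dropped.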
The resulting dataframes for 2015 and 2016 after the explosion and removal of unneeded columns have the following structure:
df_final = (df
.withColumn("CountType", F.regexp_extract("V1COUNTS", r"^([^#]+)",
1))
.withColumn("ExpandedLocation",
F.explode(F.split(
F.regexp_extract("V1LOCATIONS",
r"(^|;)([^#]+#[^#]+#[^#]+)",
0), ";")))
.withColumn("ExtractedLocation",
F.regexp_extract("ExpandedLocation", r"#([^#]+)#", 1))
.withColumn("Region",
F.element_at(F.split(F.col("ExtractedLocation"),", "),
1))
.withColumn("Country",
F.element_at(F.split(F.col("ExtractedLocation"), ", "),
-1))
.withColumn("Region", F.when(F.col("Region") == F.col("Country"),
None).otherwise(F.col("Region")))
.withColumn("Themes_List", F.split(F.col("V1THEMES"), ";"))
.withColumn("Persons_List", F.split(F.col("V1PERSONS"), ";"))
.drop("ExpandedLocation", "ExtractedLocation",
"V2.1DATE", "V2.1ENHANCEDDATES", "V2GCAM",
"V2.1SHARINGIMAGE", "V2.1RELATEDIMAGES",
"V2.1SOCIALIMAGEEMBEDS", "V2.1SOCIALVIDEOEMBEDS",
"V2.1QUOTATIONS", "V2.1ALLNAMES", "V2.1AMOUNTS",
"V2.1TRANSLATIONINFO", "V2EXTRASXML")
)
df_final_2015 = (df_2015
.withColumn("CountType", F.regexp_extract("V1COUNTS",
r"^([^#]+)", 1))
.withColumn("ExpandedLocation",
F.explode(F.split(
F.regexp_extract("V1LOCATIONS",
r"(^|;)([^#]+#[^#]+#[^#]+)",
0), ";")))
.withColumn("ExtractedLocation",
F.regexp_extract("ExpandedLocation",
r"#([^#]+)#", 1))
.withColumn("Region",
F.element_at(F.split(F.col("ExtractedLocation"),
", "), 1))
.withColumn("Country",
F.element_at(F.split(F.col("ExtractedLocation"),
", "), -1))
.withColumn("Region",
F.when(F.col("Region") == F.col("Country"),
None).otherwise(F.col("Region")))
.withColumn("Themes_List", F.split(F.col("V1THEMES"), ";"))
.withColumn("Persons_List", F.split(F.col("V1PERSONS"), ";"))
.drop("ExpandedLocation", "ExtractedLocation",
"V2.1DATE", "V2.1ENHANCEDDATES", "V2GCAM",
"V2.1SHARINGIMAGE", "V2.1RELATEDIMAGES",
"V2.1SOCIALIMAGEEMBEDS", "V2.1SOCIALVIDEOEMBEDS",
"V2.1QUOTATIONS", "V2.1ALLNAMES", "V2.1AMOUNTS",
"V2.1TRANSLATIONINFO", "V2EXTRASXML")
)
display(df_final.limit(5).toPandas().transpose())
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| GKGRECORDID | 20161006153000-2 | 20161006153000-3 | 20161006153000-4 | 20161006153000-6 | 20161006153000-8 |
| V2SOURCECOLLECTIONIDENTIFIER | 2 | 2 | 1 | 1 | 1 |
| V2SOURCECOMMONNAME | BBC Monitoring | BBC Monitoring | ap.org | wharfedaleobserver.co.uk | siouxcityjournal.com |
| V2DOCUMENTIDENTIFIER | News/BBC Monitoring/(c) BBC | Interfax-Ukraine news agency, Kiev/BBC Monitor... | http://bigstory.ap.org/article/dade98ba770f4e4... | http://www.wharfedaleobserver.co.uk/news/natio... | http://siouxcityjournal.com/business/average-u... |
| V1COUNTS | None | None | AFFECT#2#counties#2#Florida, United States#US#... | ARREST#5#teenagers#1#Poland#PL#PL#52#20#PL;SOC... | None |
| V2.1COUNTS | None | None | AFFECT#2#counties#2#Florida, United States#US#... | ARREST#5#teenagers#1#Poland#PL#PL#52#20#PL#0;S... | None |
| V1THEMES | LEADER;TAX_FNCACT;TAX_FNCACT_PRESIDENT;USPEC_P... | MEDIA_MSM;CRISISLEX_CRISISLEXREC;TAX_ETHNICITY... | TAX_WORLDMAMMALS;TAX_WORLDMAMMALS_CAT;NATURAL_... | ARREST;SOC_GENERALCRIME;CRISISLEX_C07_SAFETY;T... | WB_336_NON_BANK_FINANCIAL_INSTITUTIONS;WB_1920... |
| V2ENHANCEDTHEMES | TAX_FNCACT_CANDIDATES,564;TAX_FNCACT_JUDGES,61... | WB_678_DIGITAL_GOVERNMENT,1663;WB_694_BROADCAS... | WATER_SECURITY,11267;WATER_SECURITY,11373;SHOR... | SEIZE,1098;TAX_FNCACT_INSPECTOR,642;WB_2024_AN... | GENERAL_GOVERNMENT,553;EPU_POLICY_GOVERNMENT,5... |
| V1LOCATIONS | 4#Pretoria, Gauteng, South Africa#SF#SF06#-25.... | 4#Kiev, Ukraine (General), Ukraine#UP#UP00#50.... | 3#Charleston, South Carolina, United States#US... | 1#Poland#PL#PL#52#20#PL;5#Essex, Essex, United... | 3#Washington, Washington, United States#US#USD... |
| V2ENHANCEDLOCATIONS | 1#South Africans#SF#SF##-29#24#SF#592;4#Pretor... | 1#Ukrainian#UP#UP##49#32#UP#78;1#Ukrainian#UP#... | 3#Orlando, Florida, United States#US#USFL#FL09... | 1#Polish#PL#PL##52#20#PL#35;5#Essex, Essex, Un... | 3#Washington, Washington, United States#US#USD... |
| V1PERSONS | muvhango lukhaimane;thulisile madonsela;sharis... | petro poroshenko | laura axelsen;john guthrie;ray gohill;nikki ha... | shelley wright;arkadiusz jozwik;arek jozwik | freddie mac |
| V2ENHANCEDPERSONS | Muvhango Lukhaimane,698;Thulisile Madonsela,41... | Petro Poroshenko,105;Petro Poroshenko,510 | Laura Axelsen,8279;John Guthrie,13870;Ray Gohi... | Shelley Wright,1288;Arkadiusz Jozwik,227;Arek ... | Freddie Mac,210 |
| V1ORGANIZATIONS | national assembly on;security agency;national ... | national police of ukraine;office of ukraine;s... | orlando international airport;u s national hur... | None | None |
| V2ENHANCEDORGANIZATIONS | National Assembly On,803;Security Agency,977;N... | National Police Of Ukraine,1272;Office Of Ukra... | Law Enforcement Division,5563;Law Enforcement ... | None | None |
| V1.5TONE | 1.73160173160173,4.76190476190476,3.0303030303... | -4.27807486631016,1.8716577540107,6.1497326203... | -4.40702257255464,1.21820136151917,5.625223934... | -5.78034682080925,1.44508670520231,7.225433526... | 0.694444444444444,1.38888888888889,0.694444444... |
| CountType | None | None | AFFECT | ARREST | None |
| Region | Pretoria | Kiev | Charleston | None | Washington |
| Country | South Africa | Ukraine | United States | Poland | United States |
| Themes_List | [LEADER, TAX_FNCACT, TAX_FNCACT_PRESIDENT, USP... | [MEDIA_MSM, CRISISLEX_CRISISLEXREC, TAX_ETHNIC... | [TAX_WORLDMAMMALS, TAX_WORLDMAMMALS_CAT, NATUR... | [ARREST, SOC_GENERALCRIME, CRISISLEX_C07_SAFET... | [WB_336_NON_BANK_FINANCIAL_INSTITUTIONS, WB_19... |
| Persons_List | [muvhango lukhaimane, thulisile madonsela, sha... | [petro poroshenko] | [laura axelsen, john guthrie, ray gohill, nikk... | [shelley wright, arkadiusz jozwik, arek jozwik] | [freddie mac] |
Additionally, we created two separate dataframes: one containing discussions specific to the Philippines and another encompassing its neighboring countries in Southeast Asia (SEA). This approach enables us to provide deeper context and substance to insights specific to the Philippines by considering regional dynamics and discussions in nearby countries.
# Filtering the DataFrame for SEA countries
sea_countries = ["Brunei", "Cambodia", "Indonesia", "Laos", "Malaysia",
"Myanmar", "Philippines", "Singapore", "Thailand", "Vietnam",
"East Timor"]
# 2016 dataframes
df_sea_ph = (df_final
.filter(F.col('Country')
.isin(sea_countries))
)
df_ph = (df_sea_ph
.filter(F.col('Country')
.rlike('Philippines')
)
)
df_sea = (df_sea_ph
.filter(~F.col('Country')
.rlike('Philippines')
)
)
# 2015 dataframes
df_sea_ph_2015 = (df_final_2015
.filter(F.col('Country')
.isin(sea_countries))
)
df_ph_2015 = (df_sea_ph_2015
.filter(F.col('Country')
.rlike('Philippines')
)
)
df_sea_2015 = (df_sea_ph_2015
.filter(~F.col('Country')
.rlike('Philippines')
)
)
# print("Dataframe of the Philippines:")
# display(df_ph.limit(1).toPandas().transpose())
# print("\nDataframe of the neighboring countries in SEA:")
# display(df_sea.limit(1).toPandas().transpose())
EXPLORATORY DATA ANALYSIS
How many distinct counts were made for each type, such as kill, arrest, protest, etc. in the Philippines?
The data paints a potentially concerning picture of the documents about the Philippines during the first week of October 2016 (Figure 2), shortly after the new Duterte administration took office.
The most striking observation from the data is the extraordinarily high share of reports related to "KILL" events, which stands at 60.84% of the reports across the five most frequent count types. This suggests that there was a significant amount of coverage on killings or deaths during that particular week.
Other notable events reported include "ARREST" (18.69%), "KIDNAP" (8.83%), and "AFFECT" (8.53%), indicating a considerable emphasis on law enforcement actions, abductions, and incidents with broader impacts.
result = (df_ph
.filter(F.col("CountType").isNotNull())
.groupBy("CountType")
.count()
.orderBy("count", ascending=False)
.limit(5)
.toPandas()
)
# Sort the DataFrame by count in ascending order
result_asc = result.sort_values(by="count", ascending=True)
# Calculate the total count
total_count = result_asc["count"].sum()
# Calculate the percentage for each count
result_asc["percentage"] = (result_asc["count"] / total_count) * 100
# Extract the max percentage to highlight the corresponding bar
max_percentage = result_asc["percentage"].max()
# Define colors for each bar, highlighting the highest one
colors = ["#C8C5C5" if percentage != max_percentage else '#880808'
for percentage in result_asc["percentage"]]
# Initialize the figure
fig = go.Figure()
# Add a horizontal bar trace with dynamic colors
fig.add_trace(
go.Bar(
y=result_asc["CountType"],
x=result_asc["percentage"],
marker=dict(color=colors),
text=result_asc["percentage"].apply(lambda x: f"{x:.2f}%"),
orientation='h'
)
)
# Set titles and labels
fig.update_layout(
title={
'text': (
"A Week in Review: What Happened in Early October 2016?<br>"
"<sub>The first week of October 2016, shortly after Rodrigo "
"Duterte took office, saw many reports of killings,<br>"
"casting a spotlight on a potential pressing issue.</sub>"
),
'y':0.95, # Adjusts the vertical alignment of the title
'x':0.5,
'xanchor': 'center',
'yanchor': 'top',
'pad': {'b': 20}
},
title_font_size=16,
title_y=0.95, # Adjust the title position downwards to create more space
xaxis_title="Percentage",
yaxis_title="Count Type",
margin=dict(l=50, r=50, t=100, b=50),
width=900,
height=400,
template="plotly_white"
)
# Show the plot
fig.show()
However, these figures will remain a mere theory without more context. The GDELT Project describes the KILL count type as any mention of something dying. This could render our analysis limited and premature without more information on what the KILL-type documents, or documents of any other type, are all about.
Lucky for us, the GKG dataset includes information on the themes of the reports, and it's quite comprehensive too.
Which themes are associated with documents that have a KILL count type?
In the dataset, a single document can be tagged with multiple themes. For example, consider this report from the GKG dataset (Catholic News Service, 2016), which is classified under the "KILL" count type.
This document is tagged with themes such as:
DRUG_TRADE;WB_1331_HEALTH_TECHNOLOGIES;WB_2453_ORGANIZED_CRIME;WB_1350_PHARMACEUTICALS;WB_2433_CONFLICT_AND_VIOLENCE;WB_621_HEALTH_NUTRITION_AND_POPULATION;WB_2432_FRAGILITY_CONFLICT_AND_VIOLENCE;WB_2456_DRUGS_AND_NARCOTICS;ARMEDCONFLICT;
Even without a detailed review of the article itself, it's clear that the primary focus is on drugs and violence. In fact, the predominant themes of "KILL"-related documents are about casualties, fragility, conflict, and violence (Figure 4).
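As a rough illustration of how such theme tallies can be produced, the sketch below counts themes across a few ";"-delimited theme strings using only the standard library. It applies the same filters as the notebook query (dropping blanks, themes starting with "TAX", and the "KILL" tag itself). The helper `top_themes` and the sample strings are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def top_themes(theme_strings, n=3):
    """Count non-taxonomy themes across ';'-delimited V1THEMES strings."""
    counts = Counter()
    for raw in theme_strings:
        for theme in raw.split(";"):
            # Drop blanks, TAX* themes, and the KILL tag itself
            if theme and not theme.startswith("TAX") and theme != "KILL":
                counts[theme] += 1
    return counts.most_common(n)


# Hypothetical theme strings, for illustration only
docs = [
    "DRUG_TRADE;WB_2433_CONFLICT_AND_VIOLENCE;ARMEDCONFLICT;",
    "KILL;TAX_FNCACT;DRUG_TRADE;ARMEDCONFLICT;",
    "DRUG_TRADE;",
]
print(top_themes(docs))
```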
result = (df_ph
.withColumn('Theme', F.explode('Themes_List'))
.filter((F.col('CountType') == 'KILL') &
(F.col('Country') == 'Philippines') &
(~F.col('Theme').startswith('TAX')) &
(F.col('Theme') != 'KILL') &
(F.col('Theme') != '')
)
.groupBy('Theme')
.count()
.orderBy('count', ascending=False)
.limit(10).toPandas()
)
# result
# Calculate the total count
total_count = result["count"].sum()
# Calculate the percentage for each count
result["percentage"] = (result["count"] / total_count) * 100
# Sort the DataFrame by count in ascending order
result_sorted = result.sort_values(by="count", ascending=True)
# Extract the max percentage to highlight the corresponding bar
max_percentage = result_sorted["percentage"].max()
# Define colors for each bar, highlighting the highest one
colors = ["#C8C5C5" if percentage != max_percentage else '#880808'
for percentage in result_sorted["percentage"]]
# Create the bar graph with Plotly in horizontal orientation
fig = go.Figure(
go.Bar(
x=result_sorted['percentage'], # Set the percentages on the x-axis
y=result_sorted['Theme'], # Set the themes on the y-axis
text=result_sorted['percentage'].apply(lambda x: f"{x:.2f}%"),
textposition='auto',
orientation='h', # Make the bar chart horizontal
marker=dict(color=colors) # Customize the colors
)
)
# Customize the layout
fig.update_layout(
title={
'text': (
"Top 10 Themes Associated with KILL counts in the Philippines "
"(2016)<br><sub>During the first week of October 2016, following "
"the onset of the Duterte administration,<br> themes of violence "
"and crime were particularly prominent</sub>"
),
'y': 0.95, # Adjusts the vertical alignment of the title
'x': 0.5,
'xanchor': 'center',
'yanchor': 'top',
'pad': {'b': 20}
},
xaxis_title="Percentage",
yaxis_title="Themes",
template="plotly_white",
margin=dict(l=50, r=50, t=100, b=50),
width=1000, # Adjust the width of the figure
height=600
)
# Show the plot
fig.show()
At this early stage, it is important to note that this high volume of "KILL"-related documents, or of any other type, is only potentially concerning for several reasons:
Media Bias and Reporting Trends: The data may reflect an increase in media reporting on certain types of events rather than an actual increase in the events themselves. Media outlets might have chosen to focus more on violent incidents due to heightened public interest in the Duterte administration or other editorial decisions.
Data Limitations: The dataset captures what is reported in the digital space, not necessarily what occurs. There may be underreporting or overreporting in certain areas, and the data might not fully represent the on-ground realities. This is precisely the reason why this methodology should only be used to support the government's actual findings. Ideally, law enforcement agencies should have a separate and much more comprehensive monitoring system that would validate the outputs of the would-be dashboard based on this study.
Contextual Factors: To address this uncertainty, the analysis should ideally extend to a wider time period to establish a trend attributable to the onset of the Duterte administration. However, due to current computational constraints, focusing on a specific week within the Duterte regime and comparing it to the same period in the previous year should suffice for now.
Comparative Baseline: Building on the previous point, to fully understand the significance of these numbers, they need to be compared against a baseline. Without comparing these figures to previous weeks, months, or similar periods in past years, it is difficult to assess whether there has been a significant change in the number of "KILL" events, or events of any other type. In our case, we compare our analyses to those of the same week in the previous year (2015).
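The baseline comparison described above can be sketched as a small helper that reports the absolute and relative change per count type. This is only an illustration: the function name `compare_to_baseline` is hypothetical, and the counts below are made-up numbers, not values from the GKG data.

```python
def compare_to_baseline(current, baseline):
    """Return the absolute and relative change per category versus a baseline."""
    changes = {}
    for category in sorted(set(current) | set(baseline)):
        now = current.get(category, 0)
        before = baseline.get(category, 0)
        # The ratio is undefined when the category is absent from the baseline
        ratio = now / before if before else None
        changes[category] = {"delta": now - before, "ratio": ratio}
    return changes


# Hypothetical counts, for illustration only (not taken from the dataset)
counts_2015 = {"KILL": 100, "ARREST": 40}
counts_2016 = {"KILL": 190, "ARREST": 60, "KIDNAP": 25}
changes = compare_to_baseline(counts_2016, counts_2015)
for category, change in changes.items():
    print(category, change)
```

Returning `None` for categories that are missing from the baseline keeps newly appearing count types visible without implying an infinite increase.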
How do the numbers compare to the corresponding counts from the same period in the previous year (2015)?
As it turns out, when compared to the same week of the previous year (2015), the number of reported killings almost doubled three months after the Duterte administration began (Figure 5). The rise in the number of arrests could also be indicative of more aggressive law enforcement or changes in policing strategies.
result_2015 = (df_ph_2015
.filter(F.col("CountType").isNotNull())
.groupBy("CountType")
.count()
.orderBy("count", ascending=False)
.limit(5)
.toPandas()
)
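# Recompute the 2016 count types: `result` currently holds the theme
# query from the previous section and has no CountType column
result_2016 = (df_ph
    .filter(F.col("CountType").isNotNull())
    .groupBy("CountType")
    .count()
    .orderBy("count", ascending=False)
    .limit(5)
    .toPandas()
)
# Calculate total counts
total_count_2015 = result_2015["count"].sum()
total_count_2016 = result_2016["count"].sum()
# Calculate percentages
result_2015["percentage"] = (result_2015["count"] / total_count_2015) * 100
result_2016["percentage"] = (result_2016["count"] / total_count_2016) * 100
# Sort data by count
result_asc_2015 = result_2015.sort_values(by="count", ascending=True)
result_asc_2016 = result_2016.sort_values(by="count", ascending=True)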
# Extract max percentages
max_percentage_2015 = result_asc_2015["percentage"].max()
max_percentage_2016 = result_asc_2016["percentage"].max()
# Define colors
colors_2015 = ["#C8C5C5" if percentage != max_percentage_2015 else '#000000'
for percentage in result_asc_2015["percentage"]]
colors_2016 = ["#C8C5C5" if percentage != max_percentage_2016 else '#880808'
for percentage in result_asc_2016["percentage"]]
# Create subplots
fig = make_subplots(rows=1, cols=2,
subplot_titles=("1st Week of October 2015",
"1st Week of October 2016"))
# Add the 2015 bar chart to the first subplot
fig.add_trace(
go.Bar(
y=result_asc_2015["CountType"],
x=result_asc_2015["percentage"],
marker=dict(color=colors_2015),
text=result_asc_2015["percentage"].apply(lambda x: f"{x:.2f}%"),
orientation='h',
showlegend=False
),
row=1, col=1
)
# Add the 2016 bar chart to the second subplot
fig.add_trace(
go.Bar(
y=result_asc_2016["CountType"],
x=result_asc_2016["percentage"],
marker=dict(color=colors_2016),
text=result_asc_2016["percentage"].apply(lambda x: f"{x:.2f}%"),
orientation='h',
showlegend=False
),
row=1, col=2
)
# Update layout properties for both subplots
fig.update_layout(
title={
'text': (
"A Week in Review: What Happened in Early October 2015 vs. "
"October 2016?<br>"
"<sub>The number of killings in 2016 almost doubled compared to "
"2015 three months after the Duterte administration began.</sub>"
),
'y': 0.95, # Adjusts the vertical alignment of the title
'x': 0.5,
'xanchor': 'center',
'yanchor': 'top',
'pad': {'b': 20} # Adjusts the padding below the title
},
title_font_size=16,
xaxis_title="Percentage",
yaxis_title="Count Type",
width=1000,
height=400,
template="plotly_white",
annotations=[
{
"font": {"size": 14},
"x": 0.225,
"y": 1.05,
"showarrow": False,
"text": "1st Week of October 2015",
"xref": "paper", "yref": "paper"
},
{
"font": {"size": 14},
"x": 0.775,
"y": 1.05,
"showarrow": False,
"text": "1st Week of October 2016",
"xref": "paper", "yref": "paper"
}
]
)
# Adjust the subplot titles
fig.update_xaxes(title_text="Percentage", row=1, col=1)
fig.update_yaxes(title_text="Count Type", row=1, col=1)
# Display the combined plot
fig.show()
Additionally, there is an apparent difference in the themes of broadcast news and other online discussions between 2015 and 2016 (Figure 6). Specifically, the data shows a significant increase in reports related to violence and crime following the onset of the Duterte administration. The "CRISISLEX_T03_DEAD" theme, associated with reports of deaths, saw the highest increase in mention counts. Its share of the top-ten theme mentions, however, was 10.83% in the first week of October 2016, compared to 14.62% in the same period of 2015. The percentages are computed within each year's top ten themes, so a theme's share can fall even as its count rises.
Other themes such as "WB_2432_FRAGILITY_CONFLICT_AND_VIOLENCE" and "WB_2433_CONFLICT_AND_VIOLENCE" also saw substantial increases, with 10.52% and 10.21% of the total mentions respectively in 2016. These themes indicate heightened attention to issues of fragility and conflict, reflecting the turbulent environment during the Duterte administration's early months. The government's hardline policies towards illegal drug trade and the consequential social unrest have likely contributed to the increased focus on these themes.
Notably, the theme "LEADER" appeared with 10.03% of the total mentions in 2016 but was absent in 2015, suggesting increased focus on leadership-related discussions. This shift could be linked to the new administration's leadership style and policy decisions, which have sparked extensive debate and media coverage. Similarly, the theme "WB_621_HEALTH_NUTRITION_AND_POPULATION" was prominent in 2016 with 10% of the total mentions, compared to its absence in 2015, possibly reflecting increased concerns about public health and population issues amidst the administration's controversial policies.
Other emerging themes in 2016, such as "USPEC_POLITICS_GENERAL1" and "ARMEDCONFLICT," with 9.96% and 9.69% mentions respectively, further underscore the significant changes in the socio-political landscape under the Duterte administration. These themes highlight the broader political dynamics and instances of armed conflict that have become more pronounced in the media and public discourse during this period.
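Because the figure plots raw counts while its labels show each theme's share within a year's top ten, the two measures can move in opposite directions. The sketch below illustrates this with hypothetical numbers (they are not taken from the dataset); the helper `shares` is likewise an illustrative name.

```python
def shares(counts):
    """Convert raw counts into percentage shares of their total."""
    total = sum(counts.values())
    return {name: 100 * value / total for name, value in counts.items()}


# Hypothetical theme counts, for illustration only
themes_2015 = {"CRISISLEX_T03_DEAD": 30, "ALL_OTHER_THEMES": 170}
themes_2016 = {"CRISISLEX_T03_DEAD": 60, "ALL_OTHER_THEMES": 540}

share_2015 = shares(themes_2015)["CRISISLEX_T03_DEAD"]
share_2016 = shares(themes_2016)["CRISISLEX_T03_DEAD"]

# The count doubles (30 -> 60) while the share drops (15% -> 10%)
print(f"2015: count=30, share={share_2015:.2f}%")
print(f"2016: count=60, share={share_2016:.2f}%")
```

When comparing years, it therefore helps to state explicitly whether a reported increase refers to counts or to shares.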
result_2015 = (df_ph_2015
.withColumn('Theme', F.explode('Themes_List'))
.filter((F.col('CountType') == 'KILL') &
(F.col('Country') == 'Philippines') &
(~F.col('Theme').startswith('TAX')) &
((F.col('Theme') != '')) &
(F.col('Theme') != 'KILL')
)
.groupBy('Theme')
.count()
.orderBy('count', ascending=False)
.limit(10).toPandas()
)
# Calculate total counts
total_count_2015 = result_2015["count"].sum()
total_count_2016 = result["count"].sum()
# Calculate percentages
result_2015["percentage"] = (result_2015["count"] / total_count_2015) * 100
result["percentage"] = (result["count"] / total_count_2016) * 100
# Sort data by count
result_asc_2015 = result_2015.sort_values(by="count", ascending=True)
result_asc_2016 = result.sort_values(by="count", ascending=True)
# Extract max percentages
max_percentage_2015 = result_asc_2015["percentage"].max()
max_percentage_2016 = result_asc_2016["percentage"].max()
colors_2015 = ["#C8C5C5" if percentage != max_percentage_2015 else '#000000'
for percentage in result_asc_2015["percentage"]]
colors_2016 = ["#C8C5C5" if percentage != max_percentage_2016 else '#880808'
for percentage in result_asc_2016["percentage"]]
# Creating a subplot with two plots side by side
fig = make_subplots(
rows=1, cols=2,
subplot_titles=("1st Week of October 2015", "1st Week of October 2016"),
horizontal_spacing=0.5 # Adjust this value to increase spacing
)
# Adding the 2015 bar chart to the first subplot
fig.add_trace(
go.Bar(
y=result_asc_2015["Theme"],
x=result_asc_2015["count"],
marker=dict(color=colors_2015),
text=result_asc_2015["percentage"].apply(lambda x: f"{x:.2f}%"),
orientation='h',
showlegend=False
),
row=1, col=1
)
# Adding the 2016 bar chart to the second subplot
fig.add_trace(
go.Bar(
y=result_asc_2016["Theme"],
x=result_asc_2016["count"],
marker=dict(color=colors_2016),
text=result_asc_2016["percentage"].apply(lambda x: f"{x:.2f}%"),
orientation='h',
showlegend=False
),
row=1, col=2
)
# Updating layout properties for both subplots
fig.update_layout(
title={
'text': (
"Comparison of KILL-related Themes Between Early October 2015 and "
"2016<br>"
"<sub>There was a significant increase in reports of violence and "
"crime following the onset of the Duterte administration.</sub>"
),
'y': 0.95,
'x': 0.5,
'xanchor': 'center',
'yanchor': 'top',
'pad': {'b': 20}
},
title_font_size=16,
xaxis_title="Count",
yaxis_title="Theme",
margin=dict(t=100),
width=1000,
height=400,
template="plotly_white",
annotations=[
{
"font": {"size": 14},
"x": 0.225,
"y": 1.05,
"showarrow": False,
"text": "1st Week of October 2015",
"xref": "paper", "yref": "paper"
},
{
"font": {"size": 14},
"x": 0.775,
"y": 1.05,
"showarrow": False,
"text": "1st Week of October 2016",
"xref": "paper", "yref": "paper"
}
]
)
# Adjusting the subplot titles
fig.update_xaxes(title_text="Count", row=1, col=1)
fig.update_xaxes(title_text="Count", row=1, col=2)
# Display the combined plot
fig.show()
Which objects are associated with documents that have KILL CountType?
Figure 7 shows the common objects associated with the news about killings in 2015 and 2016, such as suspects and victims. In 2016, many of the words also had connections to illegal drugs, like dealers and addicts. Surprisingly, the largest word in the 2016 cloud was "Jews." Further research showed that this referred to a statement made by Rodrigo Duterte during an interview, in which he compared his war on drugs to Hitler's genocide against the Jewish people. The statement drew intense criticism and backlash from nations, organizations, and human rights groups around the world (Holmes, 2017). Compared with the 2016 word cloud, the 2015 one is a night-and-day difference: its words are not associated with drugs at all and mostly consist of common words about people.
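The word clouds are driven by a frequency table of object types, that is, the third "#"-delimited field of a V1COUNTS entry. A minimal sketch of that tally in plain Python is shown below. The helper `object_frequencies` and the sample values are hypothetical, and, like the notebook, the sketch only looks at the first entry of each V1COUNTS value.

```python
from collections import Counter


def object_frequencies(v1counts_values, count_type="KILL"):
    """Tally object types for values whose first count type matches."""
    freq = Counter()
    for raw in v1counts_values:
        fields = (raw or "").split("#")
        # fields[0] is the count type, fields[2] is the object type
        if len(fields) > 2 and fields[0] == count_type and fields[2]:
            freq[fields[2]] += 1
    return dict(freq)


# Hypothetical V1COUNTS values, for illustration only
samples = [
    "KILL#3#suspects#1#Philippines#RP#RP#13#122#RP",
    "KILL#2#dealers#1#Philippines#RP#RP#13#122#RP",
    "KILL#5#suspects#1#Philippines#RP#RP#13#122#RP",
    "ARREST#4#teenagers#1#Philippines#RP#RP#13#122#RP",
    None,
]
word_freq_example = object_frequencies(samples)
print(word_freq_example)
```

A dictionary of this shape is what the notebook passes to `WordCloud.generate_from_frequencies`.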
def generate_wordcloud(df, year, colormap):
    """
    Generate a word cloud from a DataFrame for a specific year and colormap.

    Parameters
    ----------
    df : pyspark.sql.DataFrame
        The DataFrame containing the data to be processed.
    year : str
        The year for which the word cloud is generated.
    colormap : str
        The colormap to be used for the word cloud.

    Returns
    -------
    WordCloud
        A WordCloud object generated from the frequencies of 'Object'
        types for CountType 'KILL'.
    """
    result = (
        df.withColumn("Person", F.explode("Persons_List"))
        .withColumn("Object", F.split(F.col("V1COUNTS"), "#").getItem(2))
        .filter(
            (F.col("Object").isNotNull()) &
            (F.col("Object") != "") &
            (F.col("CountType") == "KILL")
        )
        .groupBy("Object")
        .count()
        .orderBy("count", ascending=False)
    )
    pdf_result = result.toPandas()
    word_freq = dict(zip(pdf_result["Object"], pdf_result["count"]))
    wordcloud = WordCloud(
        width=800, height=400, background_color="white",
        colormap=colormap, random_state=26
    ).generate_from_frequencies(word_freq)
    return wordcloud
# Generate word clouds for 2016 and 2015
wordcloud_2016 = generate_wordcloud(df_ph, "2016", "Reds")
wordcloud_2015 = generate_wordcloud(df_ph_2015, "2015", "RdGy")
# Plot the word clouds in a single plot with two columns
fig, axes = plt.subplots(1, 2, figsize=(20, 10), dpi=300)
axes[1].imshow(wordcloud_2016, interpolation="bilinear")
axes[1].axis("off")
axes[1].set_title("WordCloud of Object Types for CountType 'KILL' in 2016")
axes[0].imshow(wordcloud_2015, interpolation="bilinear")
axes[0].axis("off")
axes[0].set_title("WordCloud of Object Types for CountType 'KILL' in 2015")
plt.show()
Interestingly, and perhaps not surprisingly, Figure 8 shows that the same common "KILL"-related entities in 2016 were also the ones most associated with the documents mentioning the name "Rodrigo Duterte".
result = (df_ph.withColumn("Person", F.explode("Persons_List")).filter(
F.col("Person") == "rodrigo duterte").withColumn(
"Object",
F.split(F.col("V1COUNTS"), "#").getItem(2)).filter(
(F.col("Object").isNotNull()) & (F.col("Object") != "") &
(F.col("CountType") == "KILL")).groupBy("Object").count().orderBy(
"count", ascending=False))
pdf_result = result.toPandas()
# Prepare data for word cloud
word_freq = dict(zip(pdf_result["Object"], pdf_result["count"]))
# Generate the word cloud
wordcloud = WordCloud(width=1000,
height=200,
background_color="white",
colormap="RdGy",
random_state=26).generate_from_frequencies(word_freq)
# Display the word cloud
plt.figure(figsize=(10, 5), dpi=300)
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.title(
"WordCloud of Object Types for CountType 'KILL'"
" in 2016 filtered with Rodrigo Duterte in Persons"
)
plt.show()
Our observations so far suggest that the nature of news and discussions in the Philippines drastically changed with the ascension of the Duterte Administration to power.
But to further substantiate this interpretation, we have to paint a picture of the global landscape to determine whether this shift results from new political leadership or is simply a consequence of global trends.
For this purpose, we will narrow our focus to countries neighboring the Philippines, specifically in Southeast Asia.
What are the CountTypes of neighboring SEA countries in 2016?
Previously, we found that the number of "KILL"-related discussions nearly doubled from 2015 to 2016 during the first week of October in the Philippines. To provide a broader perspective and assess the significance of this finding, let's examine how this figure compares to the corresponding numbers in neighboring countries of the Philippines during the same period. Additionally, we'll investigate how the Philippines compares to its neighbors in other count types.
Figure 9 shows that among Southeast Asian countries, the Philippines ranked second in killings and first in kidnappings. It had one of the lowest shares of protests, similar to Laos and Singapore. One possible explanation is that the Filipino people approved of Duterte's approach to the drug war (Office of the Communications Secretary, 2016). On the other hand, a low number of protests could also indicate fear of violence during the drug war.
Note that for some count types, the neighboring countries reported no such types (e.g., POVERTY) during the week considered, which is why only the Philippines had corresponding visualizations.
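The per-country percentages in this comparison normalize each count type by the total number of typed reports for that country. The following sketch reproduces that normalization with the standard library. The function `share_by_country` and the sample records are hypothetical and only mirror the logic of the Spark query.

```python
from collections import Counter, defaultdict


def share_by_country(records):
    """Map count type -> {country: percentage of that country's total}."""
    totals = Counter(country for country, _ in records)
    pair_counts = Counter(records)
    result = defaultdict(dict)
    for (country, count_type), n in pair_counts.items():
        result[count_type][country] = 100 * n / totals[country]
    return dict(result)


# Hypothetical (country, count type) records, for illustration only
records = (
    [("Philippines", "KILL")] * 3
    + [("Philippines", "ARREST")]
    + [("Thailand", "KILL")]
    + [("Thailand", "PROTEST")]
)
share = share_by_country(records)
print(share["KILL"])
```

Normalizing by each country's own total makes countries with very different reporting volumes comparable, at the cost of hiding the absolute number of reports.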
# Calculate the total count per country across all count types
total_counts_per_country = (df_sea_ph.filter(F.col(
"CountType").isNotNull()).groupBy("Country").count().withColumnRenamed(
"count", "total_count"))
# Join the total counts to the original DataFrame
df_sea_ph_with_totals = df_sea_ph.join(total_counts_per_country,
on="Country",
how="left")
# Filter and group by 'CountType', then convert to Pandas DataFrame
result = (df_ph.filter(
F.col("CountType").isNotNull()).groupBy("CountType").count().orderBy(
"count", ascending=False).toPandas())
# Extract the unique 'CountType' values into a list
list_counttype_ph = result["CountType"].to_list()
dataframes = {}
# Create a DataFrame for each CountType
for ct in list_counttype_ph:
    dataframes[ct] = df_sea_ph_with_totals.filter(F.col("CountType").rlike(ct))
# Initialize a dictionary to hold the pandas DataFrames
pandas_dfs = {}
for ct, df in dataframes.items():
    grouped_df = (df.filter(
        F.col("Country").isNotNull()).groupBy("Country").count().orderBy(
            "count", ascending=False))
    # Join the total counts to calculate the percentages
    grouped_df = grouped_df.join(total_counts_per_country,
                                 on="Country",
                                 how="left")
    grouped_df = grouped_df.withColumn(
        "percentage", (F.col("count") / F.col("total_count")) * 100)
    # Store the resulting pandas DataFrame in the dictionary
    pandas_dfs[ct] = grouped_df.toPandas()
# Determine the new grid size (4 rows, 3 columns)
rows, cols = 4, 3
# Create subplot titles for each count type
subplot_titles = [f"{ct}" for ct in pandas_dfs.keys()]
# Initialize subplots with the new grid and increased spacing
fig = make_subplots(
rows=rows,
cols=cols,
subplot_titles=subplot_titles,
vertical_spacing=0.15, # Adjust to increase vertical spacing
horizontal_spacing=0.15, # Adjust to increase horizontal spacing
)
# Loop over each count type and its DataFrame
for idx, (ct, df) in enumerate(pandas_dfs.items()):
    # Sort by 'percentage' in ascending order so that the largest
    # share is displayed at the top of the horizontal bar chart
    result = df.sort_values(by="percentage", ascending=True)
    # Define colors, setting the Philippines to red and all others to gray
    colors = [
        "#880808" if country == "Philippines" else "#C8C5C5"
        for country in result["Country"]
    ]
    # Determine the subplot row and column indices
    row = idx // cols + 1
    col = idx % cols + 1
    # Add a horizontal bar trace with dynamic colors to the appropriate subplot
    fig.add_trace(
        go.Bar(
            x=result["percentage"],
            y=result["Country"],
            marker=dict(color=colors),
            text=result["percentage"].apply(lambda x: f"{x:.2f}%"),
            orientation="h",
            showlegend=False,
        ),
        row=row,
        col=col,
    )
# Set the overall plot layout
fig.update_layout(
title={
"text": ("A Week in Review: Count Types in Early "
"October 2016 in SEA Countries<br>"
"<sub>Showing counts for different count types</sub>"),
"y": 0.97, # Adjusts the vertical alignment of the title
"x": 0.5,
"xanchor": "center",
"yanchor": "top",
"pad": {
"b": 20
}, # Adjusts the padding below the title
},
title_font_size=16,
width=1200,
height=1200, # Increased height for better spacing
template="plotly_white",
)
# Adjust individual subplot axis titles (the x-axis shows percentages)
for idx in range(1, rows * cols + 1):
    fig.update_xaxes(title_text="Percentage",
                     row=(idx - 1) // cols + 1,
                     col=(idx - 1) % cols + 1)
    fig.update_yaxes(title_text="Country",
                     row=(idx - 1) // cols + 1,
                     col=(idx - 1) % cols + 1)
# Show the final grid plot
fig.show()
What is the mean overall tone of documents for SEA Countries in 2015 and 2016?
In Figure 10, we see that most Southeast Asian countries had a negative overall tone in the documents presented, except for Brunei, which had a positive tone in both 2015 and 2016. The overall tone for the Philippines worsened from 2015 to 2016, reflecting mostly negative news. Indonesia was an exception, showing a slight improvement in tone.
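For reference, the tone values are read from the comma-delimited V1.5TONE field, whose first three entries are used in the notebook as the overall, positive, and negative tone. The sketch below parses such strings and averages the overall tone with the standard library. The helper names and the sample strings are hypothetical, and no Spark session is assumed.

```python
from statistics import mean


def parse_tone(raw):
    """Return (overall, positive, negative) from a V1.5TONE string."""
    overall, positive, negative = (float(x) for x in raw.split(",")[:3])
    return overall, positive, negative


def mean_overall_tone(tone_strings):
    """Average the first (overall) tone value across documents."""
    return mean(parse_tone(raw)[0] for raw in tone_strings)


# Hypothetical tone strings, for illustration only
tones = ["2.0,5.0,3.0,8.0", "-4.0,1.0,5.0,6.0"]
print(parse_tone(tones[0]))
print(mean_overall_tone(tones))
```

In the sample rows shown earlier, the overall tone equals the positive score minus the negative score, which is why a negative mean indicates predominantly negative coverage.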
result = (df_sea_ph
.withColumn("overall_tone",
F.split(F.col("`V1.5TONE`"), ",")
.getItem(0).cast("float")
)
.withColumn("pos_tone", F.split(F.col("`V1.5TONE`"), ",")
.getItem(1).cast("float")
)
.withColumn("neg_tone", F.split(F.col("`V1.5TONE`"), ",")
.getItem(2).cast("float")
)
.groupBy('Country')
.agg(F.mean('overall_tone').alias('mean_overall_tone'),
F.mean('pos_tone').alias("mean_pos_tone"),
F.mean('neg_tone').alias("mean_neg_tone")
)
.orderBy('mean_overall_tone')
)
result_2015 = (df_sea_ph_2015
.withColumn("overall_tone",
F.split(F.col("`V1.5TONE`"), ",")
.getItem(0).cast("float")
)
.withColumn("pos_tone", F.split(F.col("`V1.5TONE`"), ",")
.getItem(1).cast("float")
)
.withColumn("neg_tone", F.split(F.col("`V1.5TONE`"), ",")
.getItem(2).cast("float")
)
.groupBy('Country')
.agg(F.mean('overall_tone').alias('mean_overall_tone'),
F.mean('pos_tone').alias("mean_pos_tone"),
F.mean('neg_tone').alias("mean_neg_tone")
)
.orderBy('mean_overall_tone')
)
# Convert Spark DataFrames to Pandas
df_2016 = result.toPandas()
df_2015 = result_2015.toPandas()
# Ensure both dataframes are sorted by country (if not already sorted)
df_2016 = df_2016.sort_values(by="Country")
df_2015 = df_2015.sort_values(by="Country")
# Functions to apply conditional coloring
def get_2016_colors(
    df,
    highlight_country="Philippines",
    highlight_color="#880808",
    default_color="black",
):
    return [
        highlight_color if country == highlight_country else default_color
        for country in df["Country"]
    ]


def get_2015_colors(
    df,
    highlight_country="Philippines",
    highlight_color="#EB9E9E",
    default_color="#C8C5C5",
):
    return [
        highlight_color if country == highlight_country else default_color
        for country in df["Country"]
    ]
# Create the figure
fig = go.Figure()
# Add the 2015 bar trace with conditional coloring
fig.add_trace(
go.Bar(
x=df_2015["Country"],
y=df_2015["mean_overall_tone"],
name="Mean Overall Tone (2015)",
marker_color=get_2015_colors(df_2015),
)
)
# Add the 2016 bar trace with conditional coloring
fig.add_trace(
go.Bar(
x=df_2016["Country"],
y=df_2016["mean_overall_tone"],
name="Mean Overall Tone (2016)",
marker_color=get_2016_colors(df_2016),
)
)
# Update layout for grouped bar chart
fig.update_layout(
title={'text':(
"Tone Analysis by Country (2015 vs. 2016)<br>"
"<sub>Showing the Mean Overall Tone for different "
"SEA Countries for 2015 and 2016.</sub>"
),
'y': 0.95,
'x': 0.5,
'xanchor': 'center',
'yanchor': 'top',
'pad': {'b': 20}
},
xaxis_title="Country",
yaxis_title="Mean Overall Tone",
barmode="group",
width=1000,
height=600,
template="plotly_white",
)
# Show the plot
fig.show()
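The aggregation above depends on how the `V1.5TONE` field is laid out. As a minimal sketch, assuming the field is a comma-delimited string whose first three values are the overall tone, the positive score, and the negative score (which is how the Spark code above indexes it), the parsing step can be illustrated in plain Python. The `parse_tone` helper and the sample string are hypothetical and are not taken from the dataset.

```python
def parse_tone(v15tone):
    """Return (overall, positive, negative) from a V1.5TONE-style string."""
    parts = v15tone.split(",")
    # Only the first three comma-separated values are used, mirroring
    # getItem(0), getItem(1), and getItem(2) in the Spark code.
    overall, positive, negative = (float(p) for p in parts[:3])
    return overall, positive, negative


# Hypothetical value, for illustration only
sample = "-4.5,1.5,6.0,7.5,20.0,0.5,400"
overall, positive, negative = parse_tone(sample)
print(overall, positive, negative)
```

Averaging the first element per country would then correspond to the `mean_overall_tone` column plotted above.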
Who was the most talked-about person in the first week of October 2016?
Three months after his inauguration, Rodrigo Duterte remained a major topic (Figure 11), especially due to his controversial statement comparing his anti-drug campaign to Hitler's massacre of Jews. Politicians like Panfilo Lacson and Barack Obama, prominent figures like Pope Francis, and celebrities like Catriona Gray and Mark Anthony Fernandez were also frequently mentioned. Meanwhile, in 2015, candidates in the upcoming 2016 national elections, such as Leni Robredo, Jejomar Binay, and Ferdinand Marcos Jr., were the most talked about.
def generate_wordcloud(df, colormap, year):
    """
    Generate a word cloud from a DataFrame for a specific year and colormap.
    Parameters
    ----------
    df : pyspark.sql.DataFrame
        The DataFrame containing the data to be processed.
    colormap : str
        The colormap to be used for the word cloud.
    year : str
        The year for the title of the word cloud.
    Returns
    -------
    tuple of (WordCloud, str)
        A WordCloud object generated from the frequencies of 'Person',
        and the year that was passed in (used for the plot title).
    """
    # Count the occurrences of each person across all documents
    result = (
        df.withColumn('Person', F.explode(F.col('Persons_List')))
        .groupBy('Person')
        .count()
        .orderBy('count', ascending=False)
    )
    pdf_result = result.toPandas()
    # Map each person to their count for the word cloud
    word_freq = dict(zip(pdf_result["Person"], pdf_result["count"]))
    wordcloud = WordCloud(
        width=800, height=400, background_color="white", colormap=colormap,
        random_state=26
    ).generate_from_frequencies(word_freq)
    return wordcloud, year
# Generate word clouds for 2016 and 2015
wordcloud_2016, year_2016 = generate_wordcloud(df_ph, "Reds_r", "2016")
wordcloud_2015, year_2015 = generate_wordcloud(df_ph_2015, "Reds_r", "2015")
# Plot the word clouds in a single plot with two columns
fig, axes = plt.subplots(1, 2, figsize=(20, 10), dpi=250)
axes[1].imshow(wordcloud_2016, interpolation="bilinear")
axes[1].axis("off")
axes[1].set_title(f"WordCloud of Persons in the Philippines in {year_2016}")
axes[0].imshow(wordcloud_2015, interpolation="bilinear")
axes[0].axis("off")
axes[0].set_title(f"WordCloud of Persons in the Philippines in {year_2015}")
plt.show()
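The explode-and-count step inside `generate_wordcloud` can be sketched without Spark. The snippet below is a minimal illustration, assuming each row carries a list of person names; the rows shown are hypothetical and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical rows: each inner list plays the role of one document's Persons_List
persons_lists = [
    ["rodrigo duterte", "barack obama"],
    ["rodrigo duterte"],
    ["pope francis", "rodrigo duterte"],
]

# Flatten the lists (the "explode" step) and count each name
# (the "groupBy/count" step)
word_freq = Counter(
    person for persons in persons_lists for person in persons
)
print(word_freq.most_common(1))
```

A mapping like `word_freq` has the same shape as the dictionary passed to `generate_from_frequencies` in the function above.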
What are the themes of documents associated with Rodrigo Duterte in 2016?
Among the top 10 unique themes of documents mentioning Rodrigo Duterte in early October 2016 were Leader, Politics, Fragility, Conflict, and Violence. These themes reflect the nature of news at that time and likely contributed to the decline in overall tone from 2015 to 2016.
# For the plot below
result = (df_sea_ph
.filter(F.col('Country') == 'Philippines')
.withColumn('Theme', F.explode(F.col('Themes_List')))
.withColumn('Person', F.explode(F.col('Persons_List')))
.filter(F.col('Person') == 'rodrigo duterte')
.filter(F.col('Theme') != "")
.groupBy('Theme')
.count()
.filter(~F.col('Theme').startswith('TAX'))
.orderBy('count', ascending=False)
.limit(10)
)
# Convert to pandas DataFrame and sort
pdf_duterte_themes = result.toPandas().sort_values("count", ascending=False)
# Determine the total count
total_count = pdf_duterte_themes["count"].sum()
# Calculate the percentage of each count
pdf_duterte_themes["percentage"] = (
pdf_duterte_themes["count"] / total_count) * 100
# Determine the maximum percentage
max_percentage = pdf_duterte_themes["percentage"].max()
# Define colors, highlighting only the maximum percentage theme
colors = [
"#880808" if percentage == max_percentage else "#C8C5C5"
for percentage in pdf_duterte_themes["percentage"]
]
# Initialize the figure
fig = go.Figure()
# Add a bar trace
fig.add_trace(
go.Bar(
x=pdf_duterte_themes["percentage"],
y=pdf_duterte_themes["Theme"],
orientation="h",
marker=dict(color=colors),
text=pdf_duterte_themes["percentage"].apply(
lambda x: f"{x:.2f}%"
), # Format as percentage
)
)
# Update layout
fig.update_layout(
title={'text':(
"Top 10 Unique Themes Percentage for Rodrigo Duterte<br>"
"<sub>Showing the Top 10 Unique Themes for Rodrigo Duterte.</sub>"
),
'y': 0.95,
'x': 0.5,
'xanchor': 'center',
'yanchor': 'top',
'pad': {'b': 20}
},
xaxis_title="Unique Themes Percentage",
yaxis_title="Theme",
yaxis=dict(categoryorder="total ascending"),
width=1000,
height=600,
template="plotly_white",
)
# Show the plot
fig.show()
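Note that the percentages in the chart above are computed relative to the combined count of the ten listed themes, not relative to all themes in the data. The sketch below uses hypothetical counts (not taken from the dataset) to show how a share within the top N can differ from a share of the overall total.

```python
# Hypothetical theme counts, for illustration only
theme_counts = {
    "LEADER": 50,
    "POLITICS": 30,
    "CONFLICT": 20,
    "THEME_D": 15,
    "THEME_E": 10,
}

top_n = 3
# Keep the top_n themes by count, highest first
top = sorted(theme_counts.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

top_total = sum(count for _, count in top)
all_total = sum(theme_counts.values())

# Share within the top_n themes (the normalization used for the chart)
share_of_top = {theme: 100 * count / top_total for theme, count in top}
# Share of all themes (an alternative normalization)
share_of_all = {theme: 100 * count / all_total for theme, count in top}

print(share_of_top["LEADER"], share_of_all["LEADER"])
```

Either normalization preserves the ranking of themes; only the magnitude of the reported percentages changes.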
RESULTS & DISCUSSION
Our findings underscore the dramatic shift in the focus and nature of reported events in the Philippines under the Duterte administration, which reflects a significant rise in violence and crime-related incidents and a more aggressive approach to law enforcement compared to the previous year and neighboring countries:
Finding 1: On the nature and scale of events reported during the first week of October 2016
The data from the first week of October 2016, shortly after the Duterte administration took office, reveals a potentially alarming situation in the Philippines. The most notable finding is the exceptionally high number of "KILL" events, accounting for 60.84% of the reports, indicating significant media coverage on killings or deaths during this period. Other significant events included 18.69% "ARREST" reports, 8.83% "KIDNAP" reports, 8.53% "AFFECT" reports, and 3.11% "WOUND" reports. This distribution suggests a strong focus on law enforcement actions, abductions, injuries, and incidents with broader societal impacts.
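As a quick consistency check on the figures quoted above, the five shares can be summed. The sketch below uses only the percentages reported in this finding. That they total 100% suggests they were normalized over these five count types only, although this is an inference rather than something stated explicitly.

```python
# Percentages as reported in Finding 1
shares = {
    "KILL": 60.84,
    "ARREST": 18.69,
    "KIDNAP": 8.83,
    "AFFECT": 8.53,
    "WOUND": 3.11,
}

# Round to two decimals to avoid floating-point noise in the sum
total = round(sum(shares.values()), 2)
largest = max(shares, key=shares.get)
print(total, largest)
```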
Finding 2: On the comparison to the corresponding period in the previous year (2015) before Duterte took office
When compared to the same week in the previous year (2015), there was a marked increase in reported killings, which almost doubled three months after the Duterte administration began. This increase was largely associated with drug-related crimes, with reports featuring terms such as "drug suspects," "drug addicts," and "drug personalities." The period also saw frequent mentions of "Jews" due to Duterte's controversial comparison of his anti-drug campaign to Hitler's genocide. In contrast, "KILL"-related discussions in 2015 were more generalized, focusing on non-drug-related victims such as "person," "students," and "siblings."
Finding 3: On the comparison to neighboring countries in Southeast Asia
In the context of Southeast Asia, the Philippines ranked second for "KILL"-type reports during the same period, indicating a relatively high incidence of reported killings compared to its neighbors. It had one of the lowest numbers of protest reports, similar to Laos and Singapore. This low protest rate could reflect either high approval of Duterte's anti-drug campaign or fear of violence during the drug war.
This comparison highlights the particularly severe situation in the Philippines in relation to violence and law enforcement activities during the early period of the Duterte administration.
CONCLUSION
To the extent that these results reflect actual conditions on the ground, the prototype model developed in this study has the potential to grow into a full-fledged monitoring system that can support the efforts of our law enforcement agencies to uphold the rule of law and protect the rights of all Filipino citizens. It also follows that the model has the potential to supplement the initiatives of the national government to reduce crime. For instance, a monitoring system that tracks the number of certain types of crimes (e.g., reported kidnappings, thefts, and homicides) per city would greatly help the Philippine National Police (PNP) allocate its efforts.
It is equally important to emphasize that the potential of this prototype depends on the quality, completeness, and accuracy of the data it uses. Our exploration is a testament to the reliability of the GKG dataset as a comprehensive catalog of nationwide events and broadcast knowledge in the Philippines, at least in the period considered. However, whether this holds true on a much larger geographic scale (say, worldwide) and over a wider time frame (say, a year-on-year comparison) remains uncharted territory.
SCOPE & LIMITATIONS
This study is focused on a specific timeframe and acknowledges certain limitations due to constraints in time and computational resources. The analysis covered data from a one-week period, three months after Rodrigo Duterte's inauguration. For comparative purposes, data from the corresponding week in the previous year was examined. Additionally, to provide a broader regional context, the study includes data from other Southeast Asian (SEA) countries.
While these constraints were necessary to ensure the feasibility of the analysis within the given timeframe and available resources, it is important to note that the scope of the study does not capture the full complexity of the situation. The one-week snapshot may not fully represent the trends and patterns over a longer period. Moreover, the regional comparison, while valuable, may also be influenced by differing media reporting standards and law enforcement practices in each country.
Future research should aim to expand the temporal scope of the analysis to include longer periods before and after Duterte's inauguration. It would be interesting to see how Duterte's administration differs from Marcos Jr.'s government, especially in the context of combatting illegal trade in the country. Despite these limitations, the current study provides a critical snapshot of the significant changes in the nature and scale of reported events in the Philippines during the early months of the Duterte administration.
WAY FORWARD
For future research, other than covering a longer time period, it is recommended to include the top-performing countries by GDP to help identify prevalent events and global concerns. Contrasting these findings with an analysis of countries with lower GDP could also provide valuable insights into the socio-economic factors influencing these events.
A more comprehensive analysis could also be achieved by supplementing the existing data with additional information, such as geographical coordinates obtained from sources like Nominatim or OpenStreetMap (OSM). Incorporating geographical data would enhance the depth and scope of the insights derived from the analysis, allowing more precise location-based trends and patterns to be identified. Furthermore, other GDELT datasets such as the GDELT Event Database and GCAM could further substantiate the findings from the GKG dataset.
GLOSSARY
This glossary covers the count types and themes mentioned in the results of this paper. It is not comprehensive because some terms have no definitions available on the web, but those terms are intuitive enough.
| Term | Type | Description |
|---|---|---|
| AFFECT | Count | This broad category captures everything from being sickened to refugees, evacuations, displaced persons, stranded, etc. |
| ARREST | Count | Discussion of someone being arrested, detained, jailed, imprisoned, etc. |
| DISPLACED | Count | This category counts mentions of people being displaced - see REFUGEES for counts of refugees, forced migration, and related |
| EVACUATION | Count | Mentions of evacuations |
| KIDNAP | Count | Someone being kidnapped, abducted, hostages, etc. |
| KILL | Count | Any mention of something dying |
| PROTEST | Count | Discussion of protesting, demonstrating, rioting, striking, activists, agitators, etc. |
| REFUGEES | Count | Refugees, displaced persons, forced migration, asylum seekers |
| SEIZE | Count | Something being seized (often drugs, illegal materials, etc.) |
| SICKENED | Count | This category counts anything being sickened |
| WOUND | Count | Any mention of something being wounded or injured |
| POVERTY | Theme | Poverty, homeless, destitute |
| CRISISLEX_T03_DEAD | Theme | Casualties (deceased) due to the crisis |
| LEADER | Theme | Political elites, such as lawmakers, presidents, supreme leaders, etc. |
| CRISISLEX_C07_SAFETY | Theme | Protection of people/property against harm such as violence or theft |
REFERENCES
Catholic News Service. (2016, October 5). Philippine bishops’ leader ‘in endless grief’ over drug war. Retrieved May 15, 2024, from https://catholicphilly.com/2016/10/news/world-news/philippine-bishops-leader-in-endless-grief-over-drug-war/
CrisisLex Taxonomies Now Available in GKG – The GDELT Project. (n.d.). Retrieved May 15, 2024, from https://blog.gdeltproject.org/crisislex-taxonomies-now-available-in-gkg/
Flores, H. (2018, September 23). 78% of Pinoys satisfied with drug war – SWS. Philstar.com. Retrieved May 15, 2024, from https://www.philstar.com/headlines/2018/09/24/1854162/78-pinoys-satisfied-drug-war-sws
GDELT 2.0: Our Global World in Realtime – The GDELT Project. (n.d.). Retrieved May 15, 2024, from https://blog.gdeltproject.org/gdelt-2-0-our-global-world-in-realtime/
Global Knowledge Graph Category List. (n.d.). The GDELT Project. Retrieved May 15, 2024, from https://view.officeapps.live.com/op/view.aspx?src=http%3A%2F%2Fdata.gdeltproject.org%2Fdocumentation%2FGDELT-Global_Knowledge_Graph_CategoryList.xlsx&wdOrigin=BROWSELINK
Holmes, O. (2017, November 28). Rodrigo Duterte vows to kill 3 million drug addicts and likens himself to Hitler. The Guardian. Retrieved May 15, 2024, from https://www.theguardian.com/world/2016/sep/30/rodrigo-duterte-vows-to-kill-3-million-drug-addicts-and-likens-himself-to-hitler
McKirdy, E. (2016, September 30). Rodrigo Duterte vows to kill 3 million drug addicts and likens himself to Hitler. The Guardian. Retrieved May 15, 2024, from https://www.theguardian.com/world/2016/sep/30/rodrigo-duterte-vows-to-kill-3-million-drug-addicts-and-likens-himself-to-hitler
Office of the Communications Secretary. (2016, December 19). News releases: Pinoys satisfied with drug war, believe crime went down; Duterte plays Santa to PSG. Office of the President of the Philippines. Retrieved May 15, 2024, from https://pco.gov.ph/december-19-2016-news-releases/
Philippine Politics Under Duterte: A Midterm Assessment. (n.d.). Carnegie Endowment for International Peace. Retrieved May 15, 2024, from https://carnegieendowment.org/research/2019/01/philippine-politics-under-duterte-a-midterm-assessment?lang=en
Philippines drugs war: UN votes to investigate killings. (2019, July 11). Retrieved May 15, 2024, from https://www.bbc.com/news/world-asia-48955153
Suarez, K. (2016, August 7). The Duterte list: Judges, mayors, police officials linked to drugs. RAPPLER. Retrieved May 15, 2024, from https://www.rappler.com/nation/142210-duterte-list-lgu-police-officials-linked-drugs/
Temnikova, I., Castillo, C., & Vieweg, S. (2015). EMTerms 1.0: A terminological resource for crisis tweets. In Proceedings of the International Conference on Information Systems for Crisis Response and Management (ISCRAM'15). Kristiansand, Norway.
Worley, W. (2016, July 2). Philippines president Rodrigo Duterte tells people to “go ahead and kill” drug addicts | The Independent. The Independent. Retrieved May 15, 2024, from https://www.independent.co.uk/news/world/asia/philippines-president-rodrigo-duterte-tells-people-to-go-ahead-and-kill-drug-addicts-a7116456.html